SUMMARY

Sequencing of part of a clone from a transmissible gastroenteritis virus genome 
cDNA library led to the identification of the gene encoding the E1 matrix protein. The 
amino acid sequence of the primary translation product predicts a polypeptide of 262 
residues which shares many features with the previously characterized murine 
hepatitis virus and infectious bronchitis virus E1 proteins. However, N-terminal amino 
acid sequencing revealed that a putative signal peptide of 17 residues was absent in the 
virion-associated polypeptide. The predicted mol. wt. of the mature unglycosylated 
product, 27800, is in agreement with the experimental Mr value. 

INTRODUCTION 

Transmissible gastroenteritis (TGE) is a highly contagious disease of pigs causing high 
mortality in neonates. The causal agent (TGE virus, TGEV) belongs to the family 
Coronaviridae, a group of enveloped viruses with a large, positive-stranded RNA genome (for 
review, see Siddell et al., 1983). Coronavirus-encoded information is expressed in the cell 
through a nested set of subgenomic mRNAs with common 3'-terminal sequences. The coding 
part of each mRNA corresponds approximately to the 5'-terminal sequences that are absent in 
the next smaller species. Mouse hepatitis virus (MHV) and infectious bronchitis virus (IBV) 
mRNAs contain at the 5' end a short non-coding sequence joined to the body sequences by 
discontinuous transcription; a consensus sequence identified at each intergenic region may act 
as a binding site for the RNA polymerase-leader complex (Spaan et al., 1983; Brown et al., 
1984; Lai et al., 1984; Budzilowicz et al., 1985). 

TGEV contains three major structural polypeptides: the peplomer glycoprotein E2 (200K to 
220K), which forms the distinctive surface projections, the transmembrane or matrix protein E1 
(29K ± 1K), and the nucleoprotein N (47K ± 1K), in which a single infectious RNA molecule 
of 20 kb or more is embedded (Garwes et al., 1976; Laude et al., 1986; Brian et al., 1980). In 
TGEV-infected cells, five species of subgenomic mRNA have been characterized, in vitro 
translation of which has allowed partial coding assignment (Hu et al., 1984; Jacobs et al., 1986; 
Rasschaert et al., 1987). 

The matrix protein has been the subject of intensive studies in other coronaviruses. The 
N-terminal regions of E1 proteins of MHV and bovine coronavirus (BCV) bear sugar chains 
O-linked to Ser or Thr residues (Niemann & Klenk, 1981; Niemann et al., 1984), an unusual 
feature among viral glycoproteins. Also, unlike the majority of integral membrane proteins, 
both MHV and IBV E1 proteins lack a cleaved signal peptide (Rottier et al., 1984; Stern & 
Sefton, 1982a). The restriction of E1 to internal membranes seems to determine the assembly of 
the coronavirus particles in the lumen of the endoplasmic reticulum (Holmes et al., 1981; Tooze 
et al., 1985). A model of the membrane topology of E1 has recently emerged from a combination 
of biochemical data and analysis of its primary structure (Armstrong et al., 1984; Rottier et al., 
1986). The amino half is formed of a transmembrane helix with two hairpin structures, whereas 
the carboxy half is closely adjacent to the inner surface of the viral envelope and is largely 
resistant to proteolysis. At each end, two short hydrophilic segments are assumed to project from 
either side of the membrane. 

In an attempt to elucidate the functional domains of the TGEV surface glycoproteins, a 
cDNA library of the TGEV genome has been created in this laboratory. In this paper, we report 
the sequence of part of a clone encoding the E1 gene, present information about the N-terminal 
processing of its product and compare the features predicted from its primary structure with 
those of the other coronavirus E1 proteins. 

METHODS 

cDNA cloning. The preparation of cDNA clones will be described in more detail in a subsequent 
communication. cDNA was obtained from purified genomic RNA (TGEV Purdue-115 strain) using 
oligo(dT)12-18 (Pharmacia) as primer and reverse transcriptase (Stehelin, Basel, Switzerland). RNase T2-treated 
cDNA-RNA hybrids were dC-tailed and inserted in PstI-cut dG-tailed pBR322 (Bethesda Research 
Laboratories) (Zain et al., 1979; Van der Werf et al., 1981). Escherichia coli RR1 cells were transfected with this 
material. An insert (5 kb) covering the complete E1 coding region was located in the clone pTG2.15. 

DNA sequencing. Sonicated fragments of pTG2.15 were subcloned into SmaI-cut M13mp18 vector (Deininger, 
1983). Sequencing was performed by Sanger’s dideoxy technique using [35S]dATP (New England Nuclear) as the 
label and the reaction products were analysed in buffer gradient gels. Each strand of DNA was sequenced at least 
three times. 

Sequence analysis. Sequences were analysed using the program of Queen & Korn (1984), marketed as part of the 
Microgenie program (March 1985 version, Beckman), developed for the IBM PC-XT microcomputer. The 
program, utilizing the hydrophilicity values of Hopp & Woods (1981), the mean fractional area exposed values of 
Rose et al. (1985), and the flexibility values (predicted Bnorm. data) of Karplus & Schulz (1985) was written in 
Apple Basic (F. Borras-Cuesta & H. Laude, unpublished data). 

Isolation of the E1 polypeptide and partial amino acid analysis. Purified virion polypeptides were resolved by 
SDS-PAGE as described by Laude et al. (1986). After localization by gel slice staining, the protein band was excised 
from the gel and placed in an electroelution chamber (Isco). The protein was eluted for 24 h in 50 mM-NaHCO3 + 
0.1% SDS. The reservoir buffer was subsequently changed to 10 mM-NaHCO3 + 0.01% SDS and electrodialysis 
was performed overnight (Hunkapiller et al., 1983). About 100 pmol of protein was subjected to N-terminal amino 
acid sequencing. Sequential Edman degradation was done on a ‘gas phase’ Applied Biosystems 470A apparatus 
with its dedicated on-line PTH amino acid analyser 120A. 

RESULTS 

A long open reading frame (ORF) of 867 bases, yielding a protein with the properties of E1, 
was identified on clone pTG2.15 of TGEV cDNA. The 5' end of this ORF mapped at 2.48 kb 
from the 3' end of the genome (Rasschaert et al., 1987). According to its length and position, the 
E1 ORF corresponded to the ‘unique’ region of mRNA 5 within the set of viral RNAs 
characterized by Northern blot analysis (data not shown). 

Inspection of the nucleotide sequence displayed in Fig. 1 revealed the presence, near each 
extremity, of two AACTAAAC sequences, which were assumed to be the start of the mRNA 
transcripts 5 (E1-encoding) and 6 (N-encoding) respectively. Therefore, although the ORF 
extended 23 codons upstream from the first consensus sequence, it was postulated that initiation 
of translation on mRNA 5 should occur downstream at either of the two proximal ATGs 
available. The ATG adjacent to the consensus sequence is followed by a characteristically 
hydrophobic stretch of amino acids, which may possibly act as a signal peptide for the 
translocation of E1. Alternatively, translation might start at the next ATG codon (position 184), 
yielding a product of 241 residues, still slightly larger in size than the matrix proteins of other 
coronaviruses. However, the second ATG lies in a less favourable context than the first for 
initiation of translation (Kozak, 1983). 
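
The search described above (locating the AACTAAAC consensus and then the nearest downstream ATG triplets) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sequence and the function names `find_all` and `proximal_atgs` are hypothetical and are not taken from the paper or from the sequence in Fig. 1, and the sketch ignores reading frame and Kozak context.

```python
CONSENSUS = "AACTAAAC"  # intergenic consensus sequence quoted in the text


def find_all(seq, motif):
    """Return the 0-based start index of every occurrence of motif in seq."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)  # step by one so overlapping hits are kept
    return hits


def proximal_atgs(seq, consensus=CONSENSUS, n=2):
    """Return 0-based positions of the first n ATG triplets found
    downstream of the first consensus occurrence (any reading frame)."""
    hits = find_all(seq, consensus)
    if not hits:
        return []
    start = hits[0] + len(consensus)
    return [start + i for i in find_all(seq[start:], "ATG")[:n]]


# Hypothetical toy sequence, NOT the TGEV sequence.
toy = "GGC" + CONSENSUS + "GAAATG" + "AAGATTTTG" + "CGCTAC" + "ATG" + "TAA"

print(find_all(toy, CONSENSUS))  # [3]
print(proximal_atgs(toy))        # [14, 32]
```

On a real sequence, the candidate ATGs returned this way would still need to be checked for reading frame and sequence context, as done in the text.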

Partial microsequencing of the mature E1 polypeptide was performed to confirm the site of 
translation initiation (Fig. 2). The N-terminal residues thus identified were found in perfect 
agreement with the predicted sequence up to position 14, beyond which the sequencing process 
was perturbed. This led us to conclude that: (i) translation of E1 cannot be initiated at the second 
ATG codon (position 184) and (ii) a hydrophobic peptide specified by the first 17 codons of the 
gene was not present in virion-associated E1. 

Fig. 1. Part of the sequence (957 bases) from the TGEV genomic cDNA clone pTG2.15. The main 
ORF, encoding the E1 protein, and part of the ORF corresponding to the adjacent nucleocapsid gene 
(broken line) are translated in the three letter amino acid code. The two consensus sequences are boxed. 
Proximal ATG codons are underlined, stop codons are overlined. Dots beneath Asn residues indicate 
N-glycosylation signals (Asn-X-Ser or Asn-X-Thr). The signal peptide-like sequence is underlined. 

(b)
Residue      1    2    3    4    5    6    7    8    9   10   11   12   13   14
Amino acid  Arg  Tyr  (c)  Ala  Met  Lys  Ser  Asp  Thr  Asp  Leu  Ser  (c)  Arg
pmol PTH     90  131    -  115   98  107   39   66   23   34   34   39    -   34

Fig. 2. Proximal amino acid data of TGEV E1 polypeptide. (a) Residues identified by partial 
N-terminal microsequencing of the virion protein are aligned over the predicted sequence of the E1 
precursor. The uncharged region of the putative signal peptide is boxed. The arrow indicates the 
proposed site of entry of E1 into the lipid membrane. (b) Quantity of PTH derivative measured for the 
first 14 residues. Symbols: c, Cys suspected as no PTH derivative was detected; u, residue 
undetermined; #, as in Fig. 1. 

DISCUSSION 

A cDNA cloned from TGEV genomic RNA was sequenced in the region corresponding to 
the coding part of mRNA 5 (Fig. 1). A single long ORF was found, which is shown to direct the 
synthesis of a coronavirus matrix-like protein. Only one additional ORF of more than 20 amino 
acids was detected, which was 43 residues long and within the E1 gene (position 240). Partial 
microsequencing of the virion-associated E1 polypeptide unambiguously established that the 
first residue was the Arg specified by the CGC codon at position 172 in Fig. 1. Hence it can be 
deduced that the single ATG codon available between the CGC codon and the upstream 
consensus sequence CTAAAC should be the functional initiation codon of mRNA 5. It predicts 
a primary translation product of 262 amino acids with mol. wt. 29.6K, slightly higher than the 
Mr 25K reported for the in vitro translation product of mRNA 5 (Jacobs et al., 1986; referred to 
as mRNA 6 in their study). 

The first 17 residues predicted from the nucleotide sequence were found to be lacking in the 
mature protein (Fig. 2). This oligopeptide actually fulfils the criteria of a eukaryotic signal 
peptide, namely a net charge immediately after the N-terminus and high degree of 
hydrophobicity of the 14 residue long uncharged region (McGeoch, 1985; Von Heijne, 1986). A striking 
feature is the presence of a triple Ala-Cys repeat, which also occurs in the signal peptide of rat 
oxytocin (see McGeoch, 1985). Also, the cleavage site appeared to be located between Gly and 
Arg, as predicted by the 75 to 80% accurate weight-matrix approach of Von Heijne (1986). 

These results indicate that TGEV E1 matures through the removal of a 17 amino acid leader 
peptide. Accordingly, the final product is 245 amino acids long, and has a predicted mol. wt. of 
27 780 in the unglycosylated form; it is basic (with five net charges at neutral pH) and 44% of the 
residues are hydrophobic. This finding is in contrast with that reported for the matrix proteins of 
two other coronaviruses, MHV and IBV (Rottier et al., 1986; Armstrong et al., 1984; Stern & 
Sefton, 1982a; Boursnell et al., 1984). It has been proposed that the matrix proteins of the latter 
are inserted into the membrane by the recognition of an internal transmembrane region as a 
signal sequence (Rottier et al., 1985). Incidentally, in the IBV E1 sequence (Boursnell et al., 
1984), the 22 in-frame codons between the consensus sequence and the initiation codon predict 
numerous hydrophobic residues, which might be the remnant of an ancestral signal peptide. 
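
The length and mass bookkeeping in this paragraph can be illustrated with a short Python sketch. It is not the authors' program: the function names are hypothetical, the residue masses are standard average values (an assumption, since the paper does not state which table was used), and the example input is a simple dipeptide because the full E1 sequence is not reproduced here.

```python
# Standard average residue masses in daltons (amino acid minus one water).
# Assumption: the paper does not say which mass table was used.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N and C termini


def mol_wt(seq):
    """Unglycosylated molecular weight of a polypeptide (one-letter code)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mature_length(precursor_len, signal_len):
    """Length of the product left after removal of an N-terminal signal peptide."""
    return precursor_len - signal_len


print(mature_length(262, 17))    # 245, the mature E1 length given in the text
print(round(mol_wt("GG"), 2))    # 132.12 (glycylglycine, as a sanity check)
```

Applying `mol_wt` to the 245-residue mature sequence read from Fig. 1 would be the way to check the 27 780 value quoted above.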

Pairwise comparisons of the gene sequence of TGEV E1 with those of MHV and IBV at the 
DNA level revealed no significant homology. The amino acid sequences showed, in contrast, a 
remarkable homology. The homologies found by Dayhoff’s optimal alignment are 38% 
(TGEV-MHV), 30% (MHV-IBV) and 27% (TGEV-IBV). The main regions of homology are shown in 
Fig. 3. An eight amino acid section is perfectly conserved among the three viruses (residues 128 
to 135, TGEV). Of the three potential membrane-spanning regions (thickly underlined), the 
second is well conserved within the MHV-IBV pair, and the third within the TGEV-MHV 
pair; only the first shows a nearly equal degree of homology within both pairs. This might be 
indicative of functional differentiation between the three segments. 
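
The percentage figures above depend on how the sequences are aligned, which is why the optimal alignment and the simple alignment of Fig. 3 give different values. As a minimal sketch of the counting step only (not of the alignment procedure itself), the following Python function scores two sequences that have already been aligned. The function name, the gap convention ('-') and the toy sequences are assumptions made for illustration, and columns containing a gap are excluded from the denominator, which is one of several possible conventions.

```python
def percent_identity(a, b):
    """Percentage of identical residues between two pre-aligned sequences.

    Both strings must have the same length; '-' marks a gap. Columns where
    either sequence has a gap are ignored (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment, not taken from Fig. 3.
seq1 = "MKILA-GT"
seq2 = "MRILAWGS"
print(round(percent_identity(seq1, seq2), 1))  # 71.4 (5 matches in 7 columns)
```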

The above findings suggested that the topology of TGEV E1 within the membrane might be 
essentially similar to that proposed for MHV and IBV (Armstrong et al., 1984; Boursnell et al., 
1984; Rottier et al., 1986). This is supported by the data presented in Fig. 4, where each profile 
corresponds to a computer-assisted prediction of the local tendency of the TGEV E1 
polypeptide chain to hydrophilicity (Hopp & Woods, 1981), accessibility (Rose et al., 1985) or 
mobility (Karplus & Schulz, 1985). By combining our data with those cited above, five regions 
can be delineated from the amino to the carboxy end: a signal peptide (-17 to -1), an exposed 
glycosylated segment (1 to 29), three lipid bilayer-incorporated segments (30 to 55, 66 to 86, 98 to 
117), an amphiphilic C-terminal half supposedly associated with the cytoplasmic face of the 
membrane (118 to 228) and a protruding C terminus. 
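
A hydrophilicity profile of the kind shown in Fig. 4(a) is a running average of per-residue values over a fixed window. The sketch below is a Python reimplementation under assumptions, not the authors' Apple Basic program: it uses the published Hopp & Woods (1981) scale with a six-residue window, and the function name and test peptides are hypothetical.

```python
# Hopp & Woods (1981) hydrophilicity values, one-letter amino acid code.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(seq, window=6):
    """Running average of Hopp-Woods values over a sliding window.

    Returns one value per window position, i.e. len(seq) - window + 1 values
    (an empty list if the sequence is shorter than the window).
    """
    values = [HOPP_WOODS[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical test peptides: a charged stretch scores high, a Leu-rich one low.
print(hydrophilicity_profile("RRRRRR"))        # [3.0]
print(len(hydrophilicity_profile("LLLLLLLL"))) # 3
```

Strongly negative stretches in such a profile are the candidates for membrane-spanning segments, which is how the three bilayer-incorporated regions above would show up.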

Fig. 3. Comparison of the amino acid sequences (single letter code) of the E1 polypeptides of TGEV, 
MHV (strain A59: Armstrong et al., 1984) and IBV (Beaudette strain: Boursnell et al., 1984). Dots 
denote a match between two residues; matches between the TGEV and the IBV sequences are 
indicated beneath the latter. With the simple alignment used, the homology within each E1 pair is 23% 
(TGEV-MHV), 26% (MHV-IBV) and 15% (TGEV-IBV). Boxed regions show homologies >60% 
between two or three sequences. Potential glycosylation sites are underlined. Thick bars indicate the 
three putative membrane-spanning segments (see text). 

Fig. 4. Graphical output of three prediction methods applied to the amino acid sequence of TGEV E1 
precursor polypeptide. (a) Hydrophilicity profile with a running average taken over a hexapeptide. (b) 
Exposure profile averaged as in (a); the straight line represents the mean area exposed calculated for the 
whole E1 chain. (c) Flexibility profile, with calculations made on a window of seven residues. 

Unlike the MHV and BCV matrix proteins, glycosylation of TGEV E1 has been reported to 
be of the N-linked type (Garwes et al., 1984; Jacobs et al., 1986), as for IBV (Stern & Sefton, 
1982b). Indeed, two potential N-glycosylation sites are available near the N terminus of the 
sequence (Fig. 2). The accessibility of the second site (Asn, 38) in the lumen of the endoplasmic 
reticulum is uncertain since the first putative membrane-spanning segment of TGEV E1 could 
extend virtually up to position 30, i.e. seven residues farther than the WNFS sequence, where 
the MHV spanning segment is assumed to end (see Fig. 3). In contrast, the side chain of Asn at 
residue 15 may be linked to an oligosaccharide residue, as suggested by the disturbance observed 
at this point during the N-terminal sequencing (see Fig. 2). The supposition that only the first 
N-glycosylation site is functional leads to an estimated mol. wt. of 29.5K to 30K, a value close to the 
Mr value of the E1 major species determined by electrophoresis (since the Mr of a carbohydrate-
rich mannose chain is about 2K; Klenk & Rott, 1980). Minor E1 species consistently observed 
as bands migrating more slowly in SDS-PAGE (Garwes & Pocock, 1975; Hu et al., 1984; Laude 
et al., 1986; Jacobs et al., 1986) might reflect a heterogeneity in the oligosaccharide chain rather 
than in the polypeptide chain (for example an oversized E1 polypeptide produced from mRNA 
4, only 10% larger than mRNA 5). This viewpoint is supported by the fact that TGEV E1 
yielded a single band after endoglycosidase H treatment (Hu et al., 1984; B. Delmas & H. Laude, 
unpublished results). 
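
Potential N-glycosylation signals of the form Asn-X-Ser or Asn-X-Thr, as marked in Fig. 1, can be located with a simple pattern search. The Python sketch below is illustrative only: the function name and the toy peptide are hypothetical, and the exclusion of Pro at the X position is a commonly applied refinement that is an assumption here, since the paper states the motif simply as Asn-X-Ser/Thr. A match indicates a potential site only; as discussed above, whether it is used depends on its accessibility.

```python
import re

# Asn, then any residue except Pro (assumed refinement), then Ser or Thr.
# The lookahead lets overlapping motifs each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(seq):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr motifs."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


# Hypothetical toy peptide, not the E1 sequence.
toy_peptide = "MNGSAKNLTNPT"
print(n_glycosylation_sites(toy_peptide))  # [2, 7]; the Asn-Pro-Thr at 10 is skipped
```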

To sum up, the E1 protein of TGEV shares many structural features with those of MHV and 
IBV. It is becoming clear, however, that a certain diversity may exist despite the constraints that 
are necessary to achieve the distinctive architecture of a coronavirus particle. This is true at least 
for the small hydrophilic region protruding out of the particle, to which no biological function 
has been assigned so far. Our results provide substantial evidence that TGEV E1 undergoes 
N-terminal processing and that its exposed NH2 extremity may be significantly larger and possibly 
more complex in secondary structure than those of IBV and MHV. Protease digestion has been 
shown to remove an external glycopeptide of nine residues (IBV; Cavanagh et al., 1986) and 
2.5K (MHV; Rottier et al., 1984). In comparison, the TGEV E1 external segment may approach 
30 residues, including four cysteines (see Fig. 2a and 3). Recent investigations have suggested an 
involvement of TGEV E1 in the induction of interferon α from non-immune lymphocytes (see 
Rasschaert et al., 1987; B. Charley & H. Laude, unpublished observation). An interesting 
possibility is raised that this previously unrecognized activity of E1 might proceed through the 
interaction of its NH2 free tail with the lymphocyte surface. Additional experiments are 
currently underway in order to study this question. 